1. |
- Holsmark, Rickard, 1970-
(författare)
-
Deadlock Free Routing in Mesh Networks on Chip with Regions
- 2009
-
Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
- There is a seemingly endless miniaturization of electronic components, which has enabled designers to build sophisticated computing structureson silicon chips. Consequently, electronic systems are continuously improving with new and more advanced functionalities. Design complexity ofthese Systems on Chip (SoC) is reduced by the use of pre-designed cores. However, several problems related to the interconnection of coresremain. Network on Chip (NoC) is a new SoC design paradigm, which targets the interconnect problems using classical network concepts. Still,SoC cores show large variance in size and functionality, whereas several NoC benefits relate to regularity and homogeneity. This thesis studies some network aspects which are characteristic to NoC systems. One is the issue of area wastage in NoC due to cores of varioussizes. We elaborate on using oversized regions in regular mesh NoC and identify several new design possibilities. Adverse effects of regions oncommunication are outlined and evaluated by simulation. Deadlock freedom is an important region issue, since it affects both the usability and performance of routing algorithms. The concept of faultyblocks, used in deadlock free fault-tolerant routing algorithms has similarities with rectangular regions. We have improved and adopted one suchalgorithm to provide deadlock free routing in NoC with regions. This work also offers a methodology for designing topology agnostic, deadlockfree, highly adaptive application specific routing algorithms. The methodology exploits information about communication among tasks of anapplication. This is used in the analysis of deadlock freedom, such that fewer deadlock preventing routing restrictions are required. A comparative study of the two proposed routing algorithms shows that the application specific algorithm gives significantly higher performance.But, the fault-tolerant algorithm may be preferred for systems requiring support for general communication. Several extensions to our work areproposed, for example in areas such as core mapping and efficient routing algorithms. The region concept can be extended for supporting reuse ofa pre-designed NoC as a component in a larger hierarchical NoC.
|
|
2. |
- Liu, Ming, 1982-
(författare)
-
A High-end Reconfigurable Computation Platform for Particle Physics Experiments
- 2008
-
Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
- Modern nuclear and particle physics experiments run at a very high reaction rate and are able to deliver a data rate of up to hundred GBytes/s. This data rate is far beyond the storage and on-line analysis capability. Fortunately physicists have only interest in a very small proportion among the huge amounts of data. Therefore in order to select the interesting data and reject the background by sophisticated pattern recognition processing, it is essential to realize an efficient data acquisition and trigger system which results in a reduced data rate by several orders of magnitude. Motivated by the requirements from multiple experiment applications, we are developing a high-end reconfigurable computation platform for data acquisition and triggering. The system consists of a scalable number of compute nodes, which are fully interconnected by high-speed communication channels. Each compute node features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytesDDR2 memory. A hardware/software co-design approach is proposed to develop custom applications on the platform, partitioning performance-critical calculation to the FPGA hardware fabric while leaving flexible and slow controls to the embedded CPU plus the operating system. The system is expected to be high-performance and general-purpose for various applications especially in the physics experiment domain. As a case study, the particle track reconstruction algorithm for HADES has been developed and implemented on the computation platform in the format of processing engines. The Tracking Processing Unit (TPU) recognizes peak bins on the projection plane and reconstructs particle tracks in realtime. Implementation results demonstrate its acceptable resource utilization and the feasibility to implement the module together with the sys-tem design on the FPGA. Experimental results show that the online track reconstruction computation achieves 10.8 - 24.3 times performance acceleration per TPU module when compared to the software solution on a Xeon2.4 GHz commodity server.
|
|
3. |
- She, Huimin, 1982-
(författare)
-
Network-Calculus-based Performance Analysis for Wireless Sensor Networks
- 2009
-
Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
- Recently, wireless sensor network (WSN) has become a promising technologywith a wide range of applications such as supply chain monitoringand environment surveillance. It is typically composed of multiple tiny devicesequipped with limited sensing, computing and wireless communicationcapabilities. Design of such networks presents several technique challengeswhile dealing with various requirements and diverse constraints. Performanceanalysis techniques are required to provide insight on design parametersand system behaviors. Based on network calculus, we present a deterministic analysis methodfor evaluating the worst-case delay and buffer cost of sensor networks. Tothis end, three general traffic flow operators are proposed and their delayand buffer bounds are derived. These operators can be used in combinationto model any complex traffic flowing scenarios. Furthermore, the methodintegrates a variable duty cycle to allow the sensor nodes to operate at lowrates thus saving power. In an attempt to balance traffic load and improveresource utilization and performance, traffic splitting mechanisms areintroduced for mesh sensor networks. Based on network calculus, the delayand buffer bounds are derived in non-splitting and splitting scenarios.In addition, analysis of traffic splitting mechanisms are extended to sensornetworks with general topologies. To provide reliable data delivery in sensornetworks, retransmission has been adopted as one of the most popularschemes. We propose an analytical method to evaluate the maximum datatransmission delay and energy consumption of two types of retransmissionschemes: hop-by-hop retransmission and end-to-end retransmission. We perform a case study of using sensor networks for a fresh food trackingsystem. Several experiments are carried out in the Omnet++ simulationenvironment. In order to validate the tightness of the two bounds obtainedby the analysis method, the simulation results and analytical results arecompared in the chain and mesh scenarios with various input traffic loads.From the results, we show that the analytic bounds are correct and tight.Therefore, network calculus is useful and accurate for performance analysisof wireless sensor network.
|
|
4. |
- Zhu, Jun, 1976-
(författare)
-
Energy and Design Cost Efficiency for Streaming Applications on Systems-on-Chip
- 2009
-
Licentiatavhandling (övrigt vetenskapligt/konstnärligt)abstract
- With the increasing capacity of today's integrated circuits, a number ofheterogeneous system-on-chip (SoC) architectures in embedded systemshave been proposed. In order to achieve energy and design cost efficientstreaming applications on these systems, new design space explorationframeworks and performance analysis approaches are required. Thisthesis considers three state-of-the-art SoCs architectures, i.e., themulti-processor SoCs (MPSoCs) with network-on-chip (NoC) communication,the hybrid CPU/FPGA architectures, and the run-time reconfigurable (RTR)FPGAs. The main topic of the author?s research is to model and capturethe application scheduling, architecture customization, and bufferdimensioning problems, according to the real-time requirement. Sincethese problems are NP-complete, heuristic algorithms and constraintprogramming solver are used to compute a solution.For NoC communication based MPSoCs, an approach to optimize thereal-time streaming applications with customized processorvoltage-frequency levels and memory sizes is presented. A multi-clockedsynchronous model of computation (MoC) framework is proposed inheterogeneous timing analysis and energy estimation. Using heuristicsearching (i.e., greedy and taboo search), the experiments show anenergy reduction (up to 21%) without any loss in application throughputcompared with an ad-hoc approach.On hybrid CPU/FPGA architectures, the buffer minimization scheduling ofreal-time streaming applications is addressed. Based on event models,the problem has been formalized decoratively as constraint basescheduling, and solved by public domain constraint solver Gecode.Compared with traditional PAPS method, the proposed method needssignificantly smaller buffers (2.4% of PAPS in the best case), whilehigh throughput guarantees can still be achieved.Furthermore, a novel compile-time analysis approach based on iterativetiming phases is proposed for run-time reconfigurations in adaptivereal-time streaming applications on RTR FPGAs. Finally, thereconfigurations analysis and design trade-offs analysis capabilities ofthe proposed framework have been exemplified with experiments on bothexample and industrial applications.
|
|